On the saddle point problem for non-convex optimization
Abstract
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, and neural network theory, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new algorithm, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep neural network training, and provide preliminary numerical evidence for its superior performance.
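The key step of the saddle-free Newton method is to rescale the gradient by the inverse of |H|, the Hessian with each eigenvalue replaced by its absolute value, so that negative-curvature directions repel rather than attract the iterate. Below is a minimal NumPy sketch of that update; the damping constant and the toy quadratic f(x, y) = x^2 - y^2 are illustrative assumptions, and a practical variant would avoid the full eigendecomposition in high dimensions.

```python
import numpy as np

def saddle_free_newton_step(grad, hess, damping=1e-3):
    """Rescale the gradient by |H|^-1, where |H| replaces each Hessian
    eigenvalue with its absolute value. Near a saddle, directions of
    negative curvature become descent directions instead of attractors."""
    eigvals, eigvecs = np.linalg.eigh(hess)          # H = V diag(lam) V^T
    inv_abs = 1.0 / (np.abs(eigvals) + damping)      # damped |lam|^-1
    return eigvecs @ (inv_abs * (eigvecs.T @ grad))  # |H|^-1 g

# Toy saddle f(x, y) = x^2 - y^2: gradient descent stalls near the origin,
# while the saddle-free step moves the iterate away along the y direction.
grad_f = lambda p: np.array([2 * p[0], -2 * p[1]])
hess_f = lambda p: np.diag([2.0, -2.0])

p = np.array([1e-3, 1e-3])
for _ in range(5):
    p = p - saddle_free_newton_step(grad_f(p), hess_f(p))
print(p)  # x-coordinate shrinks toward 0, y-coordinate grows: the saddle is escaped
```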
Related articles
Communication-Efficient Distributed Primal-Dual Algorithm for Saddle Point Problem
Primal-dual algorithms, which are proposed to solve reformulated convex-concave saddle point problems, have been proven to be effective for solving a generic class of convex optimization problems, especially when the problems are ill-conditioned. However, the saddle point problem still lacks a distributed optimization framework where primal-dual algorithms can be employed. In this paper, we pro...
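As background to the primal-dual structure this snippet describes, here is a minimal, single-machine gradient descent/ascent sketch on a strongly convex-concave quadratic; the objective, step size, and iteration count are assumptions for illustration, and the paper's communication-efficient distributed scheme is not reproduced.

```python
import numpy as np

# Convex-concave saddle problem:
#   min_x max_y  L(x, y) = 0.5*||x||^2 + x.T @ A @ y - 0.5*||y||^2
# The primal variable descends on L while the dual variable ascends.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x, y = rng.standard_normal(3), rng.standard_normal(3)
step = 0.1
for _ in range(200):
    gx = x + A @ y          # dL/dx: descend on the primal variable
    gy = A.T @ x - y        # dL/dy: ascend on the dual variable
    x, y = x - step * gx, y + step * gy
# (x, y) converges to the unique saddle point at the origin.
```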
متن کاملCharged Point Normalization: An Efficient Solution to the Saddle Point Problem
Recently, the assumption that local minima dominate very high dimensional non-convex optimization has been challenged, and saddle points have been put forward as the real difficulty. This paper introduces a dynamic type of normalization that forces the system to escape saddle points. Unlike other saddle-point-escaping algorithms, second order information is not utilized, and the system can be trained with an arbitra...
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates get trapped in saddle points. In this paper we identify a strict saddle property for non-convex problems that allows for efficient optimization. Using this property we show that from an arbit...
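The escape mechanism this snippet relies on can be seen on a toy strict saddle, where the Hessian has a strictly negative eigenvalue and isotropic gradient noise pushes the iterate into the negative-curvature direction. A sketch under an assumed step size and noise scale; this illustrates the general noisy-gradient idea, not the paper's specific analysis.

```python
import numpy as np

# Toy strict saddle f(x, y) = x^2 - y^2 with saddle at the origin.
# Plain gradient descent started exactly at the saddle never moves;
# small isotropic noise kicks the iterate into the negative-curvature
# (y) direction, which then grows geometrically.
rng = np.random.default_rng(1)
grad = lambda p: np.array([2 * p[0], -2 * p[1]])

p = np.zeros(2)                  # start exactly at the saddle
step, noise = 0.05, 0.01
for _ in range(200):
    p = p - step * (grad(p) + noise * rng.standard_normal(2))
print(p)  # |p[1]| is large: the iterate has left the saddle
# (f is unbounded below, so the escape direction keeps growing here.)
```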
Canonical Primal-Dual Method for Solving Non-convex Minimization Problems
A new primal-dual algorithm is presented for solving a class of non-convex minimization problems. This algorithm is based on canonical duality theory such that the original non-convex minimization problem is first reformulated as a convex-concave saddle point optimization problem, which is then solved by a quadratically perturbed primal-dual method. Numerical examples are illustrated. Comparing...
Particle Swarm Optimization with Smart Inertia Factor for Combined Heat and Power Economic Dispatch
In this paper, a particle swarm optimization with smart inertia factor (PSO-SIF) algorithm is proposed to solve the combined heat and power economic dispatch (CHPED) problem. The CHPED problem is one of the most important problems in power systems and is a challenging non-convex and non-linear optimization problem. The aim of solving the CHPED problem is to determine the optimal heat and power of generating u...
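For context, here is a generic inertia-weight particle swarm on the Rastrigin test function, standing in for the dispatch cost; the constant inertia w, the acceleration coefficients, and the objective are assumptions, and the paper's smart (adaptive) inertia rule and the CHPED cost model are not reproduced.

```python
import numpy as np

def rastrigin(x):
    """Standard non-convex benchmark with many local minima."""
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(2)
n, dim, iters = 30, 2, 200
w, c1, c2 = 0.7, 1.5, 1.5            # inertia and acceleration coefficients
pos = rng.uniform(-5.12, 5.12, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()                   # per-particle best positions
pbest_val = np.array([rastrigin(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()  # swarm-wide best position

for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # Inertia keeps part of the old velocity; the two attraction terms
    # pull each particle toward its own best and the swarm's best.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([rastrigin(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
print(gbest, pbest_val.min())
```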
On the convergence of conditional epsilon-subgradient methods for convex programs and convex-concave saddle-point problems
The paper provides two contributions. First, we present new convergence results for conditional ε-subgradient algorithms for general convex programs. The results obtained here extend the classical ones by Polyak [Sov. Math. Doklady 8 (1967) 593; USSR Comput. Math. Math. Phys. 9 (1969) 14; Introduction to Optimization, Optimization Software, New York, 1987] as well as the recent ones in [Math. P...
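For reference, the basic subgradient iteration that such convergence results build on, applied to an unconstrained l1 objective; the conditional ε-subgradient variant of the paper additionally handles constraint sets and inexact (ε-)subgradients, which this sketch omits, and the objective and step-size rule are assumptions.

```python
import numpy as np

# Minimal subgradient method for the non-smooth convex program
#   minimize f(x) = ||x - b||_1
# with diminishing, non-summable step sizes a_k = a0 / k.
b = np.array([1.0, -2.0, 0.5])
x = np.zeros(3)
a0 = 1.0
for k in range(1, 1001):
    g = np.sign(x - b)        # a valid subgradient of ||x - b||_1 at x
    x = x - (a0 / k) * g      # diminishing step sizes guarantee convergence
print(x)  # x approaches the minimizer b as the steps shrink
```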
Journal: CoRR
Volume: abs/1405.4604
Pages: -
Published: 2014